Sakana AI introduces Reinforcement-Learned Teachers (RLTs), a method that uses reinforcement learning to train smaller models to teach reasoning to large language models by generating step-by-step explanations.

#Reinforcement Learning · 26/05/2025

Microsoft and Tsinghua Introduce Reward Reasoning Models to Enhance LLM Judgement with Dynamic Compute Scaling

Microsoft and Tsinghua researchers propose Reward Reasoning Models that adaptively allocate compute resources during evaluation, significantly improving large language model judgment and alignment across complex tasks.

#Reinforcement Learning · 09/05/2025

Tsinghua University's 'Absolute Zero': Training AI Models Without External Data

Tsinghua University researchers developed the Absolute Zero paradigm, which trains large language models without external data, using a self-evolving system built around a code executor to improve reasoning.

#Reinforcement Learning · 23/04/2025

Revolutionizing LLMs: Self-Evolving Language Models Learn Without Labels Using Test-Time Reinforcement Learning

Researchers from Tsinghua University and Shanghai AI Lab introduce Test-Time Reinforcement Learning (TTRL), a method that lets large language models improve without labeled data by using self-generated pseudo-rewards at inference time.
